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In  this  paper  we  discuss  new  developments  for  the  PSMG  multiscale  method,  which  we  have 
introduced  previously  as  an  efficient  PDE  solver  for  massively  parallel  architectures. 

After  an  overview  of  the  algorithm  we  introduce  the  fundamental  multiscale  recursion  relation,  as 
well  as  appropriate  Fourier  space  notation.  We  derive  the  multiscale  recursion  as  a  single  functional 
equation  without  reference  to  grids.  We  prove  a  sequence  of  rigorous  convergence  rate  bounds  which 
provide  increasingly  accurate  estimates  of  the  convergence  rate  for  translation  invariant  problems.  We 
show  that  in  constant  coefficient  situations  the  convergence  rates  for  the  method  may  be  derived  to  arbi¬ 
trary  precision,  and  we  develop  an  efficient  numerical  scheme  for  computing  such  rates.  Convergence 
rates  are  shown  to  be  faster  than  reported  previously.  We  present  estimates  for  the  normalized  work 
involved  in  PSMG  solution:  the  number  of  parallel  arithmetic  and  communication  operations  required 
per  digit  of  error  reduction.  The  work  estimates  show  that  the  algorithm  is  highly  efficient. 
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1.  OVERVIEW 

In  many  situations  the  most  efficient  algorithms  for  the  numerical  solution  of  large  sparse 
elliptic  problems  are  the  various  multi  grid  algorithms  [1-3].  Usually  these  methods  are  able  to 
compute  a  solution  with  N  unknowns  in  0(N)  operations,  asymptotically  faster  than  most 
other  algorithms.  Efficient  parallel  implementations  of  multigrid  algorithms  have  been 
reported  on  both  SIMD  and  MIMD  parallel  computers[4-14].  Most  of  these  methods  are 
designed  for  moderately  parallel  systems,  where  each  processor  will  hold  many  degrees  of 
freedom. 

The  PSMG  method,  introduced  inf  15],  provides  a  parallel  algorithm  appropriate  to  the 
case  where  the  number  of  processors  is  comparable  to  the  number  of  degrees  of  freedom. 
Related  ideas  have  been  introduced  by  Hackbusch[16].  As  remarked  in  [17],  PSMG  is  an 
example  of  an  intrinsically  parallel  algorithm.  It  is  highly  efficient  if  sufficient  processors  are 
available,  but  is  extremely  inefficient  on  serial  or  low-parallelism  computers.  In  situations 
where  there  are  substantially  more  fine  grid  points  than  processors,  an  efficient  approach  might 
use  a  hybrid  algorithm— using  standard  multigrid  on  the  finest  grids,  but  switching  to  PSMG 
on  grid  levels  where  the  number  of  processors  approximates  or  exceeds  the  number  of  grid 
points. 

A  brief  but  complete  description  of  the  PSMG  algorithm  is  presented  in  Sections  2  and  3 
below.  Section  4  introduces  the  underlying  multiscale  recurrence  relation  which  is  used 
throughout  the  paper.  Section  5  specializes  to  translation  invariant  problems  and  introduces 
notation  in  Fourier  transform  space.  In  section  6  we  show  that  a  single  functional  equation  on 
the  unit  square  is  equivalent  to  the  PSMG  recursion  formula  across  grid  levels,  greatly  simpli¬ 
fying  the  analysis  of  PSMG  convergence.  Section  7  presents  proofs  of  basic  convergence  rate 
bounds  which  show  that  the  PSMG  method  actually  converges,  uniformly  in  the  grid  size.  Sec¬ 
tion  8  discusses  the  numerical  evaluation  of  convergence  rates  and  develops  an  0(N)  algo¬ 
rithm  for  such  evaluations.  In  section  9  we  discuss  methods  for  estimating  the  parallel  work 
required  by  PSMG.  Section  10  discusses  actual  PSMG  performance,  comparing  the  work  per 
digit  of  convergence  achieved  across  several  different  algorithms. 
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2.  THE  BASIC  IDEA 

Consider  a  simple  discretization  problem  on  a  1 -dimensional  grid.  Standard  multi  grid 
techniques  work  with  a  series  of  coarser  grids,  each  obtained  by  eliminating  the  odd  numbered 
points  of  the  previous  grid.  The  error  equation  for  the  fine  grid  is  then  projected  to  the  coarse 
grid  at  even  numbered  points,  the  coarse  grid  equation  is  solved  approximately,  and  the  error  is 
interpolated  back  to  the  fine  grid  and  added  to  the  solution  there.  Finally  a  smoothing  opera¬ 
tion  is  performed  on  the  fine  grid.  Recursive  application  of  this  procedure  defines  the  complete 
multigrid  procedure[l,  3]. 

The  basic  idea  behind  PSMG  is  the  observation  that  for  each  fine  grid  there  are  two 
natural  coarse  grids  -  the  even  and  odd  points  of  the  fine  grid.  (For  simplicity  we  assume  that 
periodic  boundary  conditions  are  enforced).  Either  of  these  coarse  grids  could  be  used  at  any 
point  to  construct  the  coarse  grid  solution,  and  both  would  presumably  provide  approximately 
equivalent  quality  solutions.  Thus  it  ought  to  be  possible  to  find  a  combination  of  the  two  solu¬ 
tions  that  is  significantly  better  than  either  separately.  It  would  follow  immediately  that  such  a 
scheme  would  converge  faster  (fewer  iterations)  than  the  corresponding  standard  multigrid 
scheme.  Note  that  on  a  massively  parallel  machine  the  two  coarse  grid  solutions  may  be 
solved  simultaneously,  in  the  same  time  as  one  of  them  would  take  -  we  assume  here  that  the 
number  of  processors  is  comparable  to  the  number  of  fine  grid  points.  Both  coarse  grid  prob¬ 
lems  are  solved  using  the  same  set  of  machine  instructions.  Consequently  the  algorithm  is  well 
suited  to  SIMD  parallel  computers,  as  well  as  to  MIMD  machines. 

The  idea  outlined  above  extends  naturally  to  multi-dimensional  problems.  In  d  dimen- 

j 

sions,  2  coarse  grids  are  obtained  from  a  fine  grid  by  selecting  either  the  even  or  the  odd 
points  in  each  of  the  d  coordinate  directions.  The  fine  grid  solution  is  then  defined  by  perform¬ 
ing  a  suitable  linear  interpolation  of  all  2d  coarse  grid  points. 


3.  THE  PSMG  ALGORITHM 

The  PSMG  algorithm  works  with  a  single  grid  of  points  G  ^  of  size  n  =  n{L)  =  2L  in 
each  dimension  (called  the  level  L  grid,  or  the  fine  grid),  but  utilizes  operators  with  different 
scales  l  <L  on  that  grid.  Thus  the  algorithm  is  strictly  speaking  multiscale  rather  than  mul¬ 
tigrid.  There  are  three  basic  operators:  a  finite  difference  operator  A ,  an  interpolation  operator 
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Q  and  a  smoothing  operator  S  -l-ZA .  All  operators  are  periodic  on  the  grid  in  each  coordi¬ 
nate  direction.  The  PSMG  algorithm  extends  naturally  to  both  Neumann  and  Dirichlet  boun¬ 
dary  conditions,  with  no  increase  in  convergence  rate.  The  simplest  approach  to  implementing 
Neumann  or  Dirichlet  boundary  conditions  is  to  use  reflection  or  anti-reflection  boundary  con¬ 
ditions  and  an  extended  grid. 

The  operators  at  scale  level  /,  denoted  A^l\  Q^l\  and  Z^l\  couple  points  at  a  distance 
di  =2L~l .  Each  level  /  operator  is  defined  at  all  points  of  the  grid  G(L\  The  basic  steps 
involved  at  level  / ,  0 </<L ,  for  the  solution  of  A^U  -f ,  starting  with  an  initial  guess  u ,  with 
error  e  and  residual  r ,  are  described  by: 

Algorithm  PSMG(I,u,f): 

1.  Compute  residual:  r  =A('l')e  =/  —  A^u 

2.  Project  residual  to  coarse  grid:  r  -  r  (trivial  injection). 

3.  Solve  coarse  grid  residual  equation  using  PSMG:  e'  =  PSMG(/-l,0,r) 

4.  Interpolate  to  fine  grid:  e"  =  Q^e' 

5.  Apply  a  relaxation:  e"'  =  (I-Z^A^)e"  +  Z^r 

6.  Compute  and  return  the  new  solution:  u'"  =  u  +  e 


An  exact  solver  is  utilized  instead  of  PSMG  on  the  coarsest  grid  in  step  3.  As  indicated  in  step 
3,  the  initial  guess  for  all  coarse  grids  is  taken  to  be  0.  Consequently  steps  1  and  6  are  required 
only  on  the  finest  grid.  The  process  as  described  depends  on  an  explicit  choice  for  and 
Z^\  The  PSMG  strategy  is  to  choose  Q (/)  and  as  functions  of  in  such  a  way  as  to 
optimize  the  convergence  rate  of  the  above  algorithm.  Explicit  choices  for  Q  ^  and  Z ^  are 
given  in[  15, 18],  and  in  Section  10,  for  the  cases  where  A^  represents  either  the  standard  5- 
point  or  Mehrstellen  discretizations  of  the  Laplacian. 
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4.  THE  PSMG  RECURRENCE  EQUATION 

We  note  that  a  two-grid  PSMG  algorithm  may  be  described  in  the  form:  r^1  ^  -  T^  r  , 
where  the  two-grid  residual  reduction  operator  T^  is  determined  by: 

t(L)  =  S(L)^  _  A(L)  Q(L)A(L-iyi  ^ 

We  define  the  two-grid  convergence  rate  %  of  this  iteration  procedure  as  the  quantity: 

x  =  sup  |  |T(L)|  |  .  Clearly  t  provides  a  bound  on  the  convergence  rate  of  the  two-grid 
L 

method  on  any  grid. 

We  obtain  an  equation  for  the  residual  reduction  operator  of  the  full  PSMG  algorithm  by 
recursive  application  of  the  two-grid  algorithm  described  above.  The  corresponding  residual 
reduction  then  takes  the  form:  r^M^  =  r  ,  where  the  multi-grid  iteration  operator  is 
determined  by: 

M(0  =T(0  +  (S(i)_T«>)M«-l)  ,  l=L,  ■■■  ,  1  , 

with  M(<l)  =  0.  We  define  the  multigrid  convergence  rate  of  this  procedure  as  the  quantity: 

j 1  -  sup  sup  I  |M(/)|  I  . 

L  l  <L 

Clearly  j i  provides  a  bound  on  the  convergence  rate  of  PSMG  on  any  grid  G  ^L\ 


5.  FOURIER  MODE  ANALYSIS 

In  order  to  complete  the  description  of  the  PSMG  algorithm  it  is  essential  to  define  the 
operators  Q  ^  and  Z ^  used  for  interpolation  and  smoothing.  In  this  section,  we  describe  suit¬ 
able  classes  of  and  Z ^  for  the  special  case  of  an  operator  which  has  translation  invariant 
coefficients.  We  will  illustrate  the  ideas  for  the  Poisson  equation  discretized  on  a  periodic  rec¬ 
tangular  grid  of  N  =  nxn  points,  n- 2L ,  which  we  label  with  the  index  i  =(/1?  i2), 
0  <  i  1?  i2  <  n  •  We  will  use  two  discretizations  of  the  negative  Laplacian  -A.  The  first  of  these 
is  the  standard  five-point  discretization  defined  by 

(APu)i  =  h,~2  (  4w;  -  M;_eJ  -ui+e,  -ui+e,2  )  , 

where  e\  are  integer  vectors  of  length  =  2L~l  in  the  coordinate  directions  in  index  space,  or 
alternatively  by  the  familiar  five-point  star  notation: 
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-1 

AP  =  hr 2  -14-1  . 

-1 

The  second  discretization  we  will  study  is  the  more  accurate  Mehrstellen  discretization 
represented  by  the  nine-point  star 

-1  -4  -l" 

AP  =  (6/z;2)'1  -4  20  -4  . 

-1  -4  -1_ 

Similarly,  we  will  choose  the  operators  and  to  be  defined  by  simple  symmetric  three 
parameter  nine-point  star  operators  (with  appropriate  scale  length  depending  on  /): 

Q\\  Q\\  zn  zi  zn 

Q(l)  =  q  1  Qo  Q\  »  z(0  =  hi2  zl  z0  z1  , 

q  n  q  i  q  it  Z11  Z1  Z11 

or  equivalently  in  operator  notation: 

{Q{l)u)i  =  q0ut  +  qx  (tq+e;  +wi_e;  +ui+ei2  +wt-_e/2)  + 

q  11  ^i+e[+e2  ^i-e'i+e 2  ^i+e[-e 2  +  ^i—e[ -el2^  5 

with  a  similar  expression  for  Z^u.  For  simplicity,  we  take  the  parameters  qx  and  z,-  to  be 
independent  of  the  scale  parameter  / . 

Since  all  of  these  operators  are  translation  invariant,  they  are  diagonalized  by  the  discrete 
Fourier  transform.  The  analysis  of  the  PSMG  algorithm  then  becomes  particularly  convenient. 
In  the  following  we  work  entirely  in  Fourier  transform  space,  where  each  of  the  operators  A 
Q(yl^  and  will  reduce  to  multiplication  by  a  trigonometric  function.  In  terms  of  the  two- 
dimensional  discrete  Fourier  transform  on  G^: 

n— 1 

uk  =  n~l  ]T  el2nj'k/nUj  ,  0<kl,k2<n  , 

and  the  notation  xp  =  cos(27 ikidi/n)  ,  /  =  1,2  ,  the  operators  A(/)  and  Q reduce  to  multi¬ 
plication  by  the  trigonometric  polynomials: 

Ail)  =hr2(4  -  2(xP  +xP))  , 

Ail)  =(6 h,2Tl  (20  -  8(jcfn  +xP)  -  AxPxP)  , 
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<2(;)  =  <7o  +  2<?1(x,(i)  +xP )  +  4<7llx1(,)xf  , 

with  a  similar  expression  for  in  terms  of  parameters  z0,  zx  and  zn.  Application  of  the 
two-grid  iteration  operator  over  the  grid  reduces  to  multiplication  by  the  function: 

where  S(/\  =  1  and  C^l\  =  1  -  are  the  Fourier  representations 

of  the  smoothing  and  coarse-scale  correction  operators,  respectively. 

For  both  the  Laplace  and  Mehrstellen  case,  C^l\  has  apparent  poles  at  the  four  points 
\xP  |  =  \xP  |  =  1,  which  are  the  zeroes  of  the  coarse  grid  difference  operator.  The  pole  at 
xp  =xP  =  1  is  canceled  by  a  corresponding  zero  in  the  numerator,  but  this  is  not  so  for  the 
other  three  poles.  We  cancel  the  remaining  zeroes  in  the  denominator  by  carefully  choosing 
the  three  parameters  qt .  It  is  easily  checked  that  the  two  conditions: 

Q i  =  <7o/2  ,  <7n=<7o/4  , 

suffice,  leaving  one  free  parameter  q0  in  the  Q  operator  to  be  chosen  later. 

A  further  constraint  is  necessary  if  the  multigrid  iteration  operator  is  to  have  a  bound 
independent  of  /,  namely  the  constraint  that  the  coarse  grid  correction  operator  vanish  at 
the  origin  in  frequency  space.  This  constraint  leads  to  the  condition  qo=.25  on  the  interpola¬ 
tion  operator  Q  ^  which  is  therefore  uniquely  determined.  The  resulting  form  of  is  then: 

L-X  j  -X  2 

for  the  five-point  operator,  with  a  similar  expression  for  the  nine-point  operator.  Note  that  the 
same  restrictions  on  Q  are  required  for  the  five  and  nine-point  operators. 

To  get  an  improved  convergence  rate  we  have  also  used  a  25-point  star  operator  for  Q  : 

Q  22  Q  12  0.  2  Q.  12  0.  22 
Q 12  <7n  Qi  Qn  Qn 
Q  =  <72  <7i  <7o  <7i  Q 2  • 

<712  1  li  <7i  <7 n  *7 12 
Q  22  Q  12  Q  2  Q  12  Q  22_ 

Again  we  compute  an  explicit  rational  function  expression  for  the  coarse-scale  operator  CP  as 
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a  function  of  the  trigonometric  variables  xfl\  As  before,  there  are  three  poles  of  the  denomina¬ 
tor  in  which  we  cancel  by  careful  choice  of  Q ,  leading  to  the  two  constraints: 

0  =  qo-4ql+4qn  +  4q2  +  4q22-$>qn  , 

0  =  ^o-4^u  +  4^2  +  4^22  • 

We  must  add  the  constraint 

1  =<7o  +  4<7i  +4#  n  +  4^2  +  4^22  +  ^12  > 

required  to  ensure  that  the  coarse  grid  correction  operator  vanishes  at  the  origin  in  fre¬ 
quency  space.  It  follows  that  there  are  3  independent  coefficients  of  Q,  along  with  the  3 
parameters  of  Z ,  that  are  available  to  minimize  p  in  the  multigrid  analysis. 

In  the  translation  invariant  case  all  of  the  operators  in  the  recurrence  equations  for  the 
multigrid  residual  reduction  operator  introduced  in  the  previous  section  commute.  Furth¬ 
ermore  is  also  then  the  error  reduction  operator.  is  therefore  determined  recursively 
as  a  sum  of  products  of  low-degree  rational  functions  in  the  xp\  Since  is  a  multiplication 
operator  in  the  translation  invariant  case,  its  norm  is  the  maximum  value  of  |  ^  j  evaluated 

over  all  relevant  frequencies  /q,  or  equivalently,  evaluated  over  those  discrete  points  in  the 
square  -1  <  x  ,x^  <1  that  correspond  to  Fourier  frequencies  on  the  grid  G^L  \ 

Translation  invariant  problems  will  have  an  inherent  singularity  if  constant  functions  are 
in  the  null  space  of  the  differential  operator  A .  In  Fourier  space  this  results  in  bad  behavior  at 
the  origin  k=( 0,0).  The  singularity  must  be  dealt  with.  One  solution  is  to  omit  constant  func¬ 
tions  and  zero  frequency  from  the  domain  of  interest.  Alternatively  one  may  include  these 
functions  by  using  the  Moore-Penrose  pseudo-inverse [19, 20]  to  extend  operators  from  the 
space  of  non-constant  data.  This  latter  choice  is  equivalent  to  defining  M^(0,0)  =  1  in  the 
basic  recurrence  relation. 


6.  A  FUNCTIONAL  EQUATION  FOR  M 

A  remarkable  feature  of  PS  MG  for  translation  invariant  PDE  without  lower  order  terms 
(such  as  the  Poisson  equation)  is  that  we  can  phrase  the  defining  equation  for  the  iteration  in  a 
form  completely  independent  of  grids  or  grid  levels.  The  resulting  equation  is  a  simple 
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functional  equation  on  the  space  of  trigonometric  polynomials.  The  key  point  in  deriving  this 
relationship  is  to  make  a  transition  from  unbounded  frequency  variables  ( k )  to  uniformly 
bounded  angle  (0)  or  cosine  (. x  =  cos(0))  variables. 

In  previous  sections,  all  operators  have  been  explicitly  shown  as  dependent  on  the  grid 
G  ^  ^  and  the  level  / ,  as  well  as  on  frequency  kt .  We  begin  with  a  lemma  which  shows  that  the 
kernels  of  these  operators  are  actually  independent  of  L  and  need  not  be  computed  at  even  fre¬ 
quencies  k . 


Lemma  1:  Each  of  the  operators  O  =S^l\  T^,  defined  on  G  ^  has  a  kernel  0 ^ 
independent  of  L ,  and  satisfying  0$  = 


Proof:  From  section  5  we  note  that  the  only  explicit  /  or  L  dependence  in  the  operators  S,  T 
is  in  the  trigonometric  functions  =  cos (2nk/2l).  Factors  of  h{  involved  in  the  operators  A 
and  Z  always  cancel  out  provided  there  are  no  first  order  or  constant  terms  in  the  difference 
operator.  Thus  the  operator  kernels  depend  explicitly  on  /  but  not  on  L .  Furthermore  we  note 
that  =  cos(27t  2kl2l)  =x^l~l\  and  consequently  we  have  the  result  of  the  lemma  for  S  and 
T. 

We  prove  the  result  for  M  by  induction  on  / .  For  /= 1  we  apply  the  recurrence  along  with 
=  1,  obtaining  M^  =  S^,  and  so  the  lemma  applies  for  1=1.  Now  assume  the  lemma 
applies  to  Then 

M&>  =  T$  +  (S$  -  T$  )M&-*> 

=  rb'“1>  +  (S|'“1)-Ti'_l))Mf“2) 

=  . 


This  completes  the  induction  proof. 
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As  a  consequence  of  the  lemma,  we  may  write  the  recurrence  as  a  single-level  functional 
equation: 

Mi,)  =  T|')  +  (Si'>  -  Tf)M$  . 

We  now  regard  M(  1  as  a  function  of  the  normalized  variables  xt  =  cos(27t k-t  In )  rather  than  of 
k{.  These  variables  span  the  range  [—1,1]  as  k-t  and  n  vary.  Alternatively  we  may  introduce  the 
variables  0f-  =  Ink-Jn  and  regard  M  as  a  function  M(0)  on  [0,2tc]x[0,2tc].  We  denote  by  DQ  the 
set  of  values  (0i,02)  where  0  <  kt  <  n ,  for  any  n  of  the  form  2/ ,  and  where  the  origin,  k  i=k2=0, 
is  omitted.  We  denote  the  corresponding  set  of  values  of  x  by  Dx.  Thus  all  of  the  values  of 
for  all  grids  G^L\  are  values  of  a  single  function  M(0),  0=(0l502)  in  D0,  or  equivalently 
of  a  single  function  M(x),  x=(xlrx2)  in  Dx-  Similarly  the  smoothing  and  two  grid  kernels 
extend  into  functions  S(0)  and  T(0)  defined  over  D0,  or  S(x)  and  T(x)  defined  over  Dx,  and 
the  multiscale  recursion  relation  reduces  to  the  form: 

M(0)  =  T(0)  +  (S(0)-T(0))  M(20) ,  M(0)  =  1  , 

or  equivalently 

M(jc)  =  T(x)  +  (S(x)-T(^))M(2x2-1),  M(l)  =  l, 

where  we  have  used  the  fact  that  cos(29)  =  2cos2(9)  -  1.  We  will  use  these  functional  equa- 
tions  in  the  following  section.  The  choice  of  M(l)  =  1  was  explained  at  the  end  of  the  previous 
section,  and  effectively  defines  the  Moore-Penrose  pseudo-inverse  on  constant  functions. 


7.  CONVERGENCE  RATE  BOUNDS:  STATEMENT  AND  PROOFS 

In  this  section  we  will  derive  rigorous  upper  bounds  on  the  multigrid  convergence  rate  of 
the  PSMG  algorithm  in  the  translation  invariant  case  described  in  the  previous  section. 


Theorem  1:  Define  pi1  =  sup  \  T(x)  |  /  (1  -  |  S(x)  -  T(x)  | ).  Assume  that  the  smoother 

satisfies  |S|  <  fij  at  the  three  high  frequency  points  (-1,1),  (1,-1)  and  (-1,-1).  Then  the  mul¬ 
tigrid  convergence  rate  \i  satisfies  the  bound  ]i  <  p^. 
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Proof:  Let  P3  denote  the  set  {(1,-1),  (-1,1),  (-1,-1)J.  We  will  use  a  mapping  g:R2  R2 
defined  by  g(xlyx 2)  =  (2x  2-l,  2x2~l)-  We  decompose  Dx  into  a  union  of  disjoint  subsets 
Dn  n> 0,  where  we  define  recursively: 

Dn  -  {x<=Dx  :  x  not  e  Dn_x  and  g  (x)  e  Dn_x}  ,  D0  =  P3. 

Dn  consists  of  the  points  of  Dx  which  correspond  to  angles  in  D  e  of  the  form  Ink  /2n  with  k 
odd.  We  will  prove  Theorem  1  holds  for  the  sets  Dn  by  induction  on  n . 

The  induction  hypothesis  holds  for  D0  =  P3.  To  see  this  we  note  that  for  each  point 
peP3,  g(p)  =  (  1,1)  and  since M (1,1)  =  1,  the  recursion  relation  reduces  at  these  points  to 

M  (p)  =  T(p )  +  (Sip  )-T(p ))  •  1  =  Sip)  ,  peP3. 

Thus  by  the  assumption  of  the  theorem  that  |S(p)|  <  for  peP3,  we  conclude  that 
|  M(p )  |  <  jLij,  and  thus  that  the  induction  hypothesis  holds  for  D  0. 

Now  assume  the  result  is  true  for  Dn_x  and  let  xeDn .  Since  g  (x)eDn_l,  it  follows  that: 
|M(x)l  ^  |T(x) |  +  |S(x)-T(x)|  | M(Zx2 — 1) [ 

<(l-|S(x)-T(x)|)|i1  +  |  S(x  )-T(x )  |  (i! 

=  Hi- 

which  completes  the  proof  of  the  induction  hypothesis  for  Dn . 


Remarks:  The  assumption  on  S  in  the  statement  of  Theorem  1  is  a  requirement  that  S  be  small 
at  high  frequencies,  which  is  essential  anyway  if  S  is  to  be  a  good  smoother.  Thus  it  is  not  a 
major  restriction.  However  the  assumption  can  be  avoided  by  terminating  the  PSMG  iteration 
at  the  level  1  grid  rather  than  at  level  0.  In  that  case  an  exact  solution  is  performed  on  level  1 , 
so  that  M  is  0  on  the  set  P3.  We  also  remark  that  the  theorem  does  not  require  that  the  dif¬ 
ferential  operator  be  homogeneous,  as  required  to  derive  the  grid  independent  recursion  rela¬ 
tion.  The  theorem  is  easily  extended  by  the  same  proof  to  the  case  where  M(x)  is  dependent 
on  the  level  / .  The  theorem  does  however  require  translation  invariance. 
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The  bound  described  above  was  used  in[15]  to  design  good  choices  for  Q  and  Z.  In  fact  the 
bound  jo.1  requires  only  that  we  find  the  suprenum  of  a  rational  function  on  the  unit  square.  One 
can  estimate  jLLj  by  evaluation  of  the  suprenum  over  a  large  grid  of  values  in  the  unit  square. 
The  resulting  quantity  is  then  used  as  an  objective  function  and  minimized  with  respect  to  the 
parameters  defining  Q  and  Z . 

Sharper  convergence  criteria  result  if  one  first  applies  the  iteration  formula  for  M(x) 
several  times.  It  is  notationally  convenient  to  write  M  as  a  function  of  the  angles 
0  <  0j  =  2t ik-Jn  <  27t.  We  also  define  R(9)  =  S(8)-T(0).  The  recurrence  relation  then  reduces 
to: 

M(0)  =  T(0)  +  R(0)M(29),  M(0)  =  1  . 

Iterating  v-1  times  we  arrive  at: 

M(0)  =  T(0)  +  R(0)T(20)  +  R(0)R(20)T(40)  +  ■  •  • 

+  R(0)R(20)  •  •  ■  R(2V_10)T(2V0)  +  R(0)R(20)  •  •  •  R(2V0)M(2V+I0) 
s  Tv(6)  +  RV(0)M(2V+18)  . 


This  again  has  a  similar  form  to  the  original  recurrence  equation,  and  we  prove  exactly  as  in 
the  previous  theorem: 


Theorem  2:  Define  =  sup  |TV(0)|  /(I  -  |RV(0)|)  .  Assume  that  the  smoother  satisfies 

e 

|  S  |  <  \iy  at  the  three  high  frequency  points  (-1,1),  (1,-1)  and  (-1,-1).  Then  the  convergence 
rate  satisfies  the  bound  p  <  fly. 


It  is  important  to  take  care  in  evaluating  the  bounds  Py.  Even  if  Rv(9)  approaches  1  at  some 
points,  the  bound  may  still  be  useful  provided  the  numerator  Tv(0)  also  vanishes  at  the  same 
points.  Indeed  this  situation  occurs  at  the  origin  even  for  p1.  In  evaluating  |TV|  it  is  also 
important  to  take  the  absolute  value  only  after  the  various  terms  have  been  added  together. 
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These  convergence  results  may  be  extended  to  provide  various  bounds  on  derivatives  of 
M(0),  and  in  particular  to  show  that  with  appropriate  conditions  imposed,  p,  may  be  approxi¬ 
mated  arbitrarily  closely  by  taking  the  suprenum  of  |  M(0)  |  on  a  suitably  fine  mesh  of  points  0. 
For  further  details  we  refer  to[21]. 

For  illustration  we  provide  here  results  obtained  on  grids  with  edges  of  size  n{L)  from  4 
through  1024  using  each  of  the  bounds  jilv.,  ji5.  The  final  column  is  the  exact  norm  as  com¬ 
puted  using  the  full  iteration  formula.  The  results  are  for  the  case  of  a  Mehrstellen  operator 
with  9-point  Q  and  Z  stencils  -  the  coefficients  of  Q  and  Z  are  given  in  the  last  section  of  the 
paper  under  PSMG9-9. 


TABLE  1:  CONVERGENCE  BOUNDS 


n(L) 

Pi 

P2 

P3 

p4 

P5 

P 

4 

.0009 

.0049 

.0049 

.0049 

.0049 

.0049 

8 

.0248 

.0174 

.0174 

.0174 

.0174 

.0174 

16 

.0264 

.0216 

.0216 

.0216 

.0216 

.0217 

32 

.0291 

.0230 

.0216 

.0216 

.0216 

.0217 

64 

.0291 

.0241 

.0216 

.0216 

.0216 

.0217 

128 

.0292 

.0241 

.0225 

.0216 

.0216 

.0217 

256 

.0292 

.0243 

.0224 

.0220 

.0217 

.0217 

512 

.0292 

.0243 

.0226 

.0220 

.0219 

.0217 

1024 

.0292 

.0243 

.0226 

.0221 

.0219 

.0217 

8.  COMPUTATION  OF  CONVERGENCE  RATES 

In  our  paper[15]  we  have  used  the  first  theorem  of  the  previous  section  to  provide  upper 
bounds  on  PSMG  convergence  rates  and  to  guide  thereby,  optimization  processes.  Basing  an 
optimization  process  for  a  rate  on  optimization  of  a  possibly  crude  upper  bound  for  that  rate  is 
certainly  not  ideal.  More  recently[18],  we  have  improved  our  knowledge  of  PSMG  conver¬ 
gence  rates  by  exactly  computing  the  spectral  radius  | of  the  self-adjoint  PSMG  multigrid 
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error  reduction  operator  for  all  grids  G^  with  L  ranging  from  0  to  11.  Here  G^  is  a 
square  grid  with  n  =2L  points  on  a  side.  The  verification  is  based  on  the  iterative  formula: 

M(/)  =  T(0  +  (S<z>  -  T(/))  M(/-1) ,  1  </<L, 

for  the  multi  grid  operator  \  The  correct  domain  for  the  translation  invariant  Poisson  equa¬ 
tion  is  the  set  of  grid  functions  orthogonal  to  constants.  In  Fourier  space  this  means  that  points 
aliased  to  (0,0)  are  omitted  from  the  computation.  Equivalently  one  defines  M^0)  =  1  in  (1). 

We  compute  by  evaluating  the  recurrence  above  from  1=0  to  l=L  in  Fourier  space 
for  every  frequency  pair  kh  k2  appropriate  to  G^  -  i.e.  for  0  <  &,•  <  n.  The  only  approxima¬ 
tion  in  this  procedure  is  that  the  kernels  are  evaluated  in  double  precision  rather  than  infinite 
precision  arithmetic. 

At  first  sight  it  would  appear  that  the  evaluation  of  p/L)  would  require  O  (/?2logL)  opera¬ 
tions  -  the  number  of  grid  points  for  level  L  times  the  cost  O  (log  L )  of  an  evaluation  of  the 
recurrence.  Remarkably,  in  Fourier  space  only  0(n 2)  operations  are  required.  The  idea  is  to 
use  the  previously  derived  fact  that  The  value  of  is  independent  of  which 

grid  level  /  it  is  evaluated  at  (assuming  frequency  k  exists  at  that  level).  Therefore  we  develop 
an  inverse  process  where  we  first  evaluate  M  at  all  coarse  grid  points,  then  evaluate  M  on  the 
next  finer  grid  but  only  at  those  points  not  in  the  coarse  grid.  Furthermore  one  can  use  sym¬ 
metry  in  klt  k 2  and  between  kt  <n/2  and  k-t  >n/2  to  reduce  the  work  by  a  further  factor  of  8. 
We  store  all  evaluated  on  a  given  level,  so  that  the  recursion  on  the  following  level  can  be 
terminated  after  one  step.  It  is  this  last  fact  that  reduces  the  overall  cost  by  log  L .  Given  that 
the  result  of  a  norm  evaluation  is  a  single  data  point  in  the  optimization  of  the  choice  of  Q  and 
Z,  the  ability  to  have  such  a  fast  evaluation  of  the  norm  of  M  is  the  key  to  obtaining  highly 
efficient  PSMG  processes  in  both  2D  and  3D. 


9.  PARALLEL  OPERATION  COUNTS 

In  this  section  we  discuss  methods  for  estimating  the  work  performed  in  a  PSMG  itera¬ 
tion.  We  use  a  model  of  parallel  computation  introduced  in [22].  Thus  we  assume  only  nearest 
neighbor  communication  with  four  neighbors,  purely  SIMD  communication  and  computation, 
and  we  consider  separately  each  grid  level.  The  operation  counts  will  be  those  for  intermediate 
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grids  and  the  communication  unit  is  grid  level  dependent  -  the  cost  for  communication  between 
"nearest  neighbors"  at  that  grid  level. 

The  PSMG  algorithm  was  described  as  6  separate  steps  in  section  3  above  to  which  we 
now  refer.  For  intermediate  grids  (0</  <L)  the  initial  guess  is  taken  to  be  0  so  that  step  1  is  not 
needed.  Similarly  step  6  is  relevant  only  to  the  top  level  grid  L .  Step  2  is  free  because  PSMG 
uses  injection,  while  step  3  is  counted  at  level  /-I  or  lower.  Thus  only  steps  4-5  are  to  be 
counted  at  level  /.  Apart  from  the  3  operators  involved,  we  note  that  no  communication  is 
required  in  these  steps  while  two  computations  are  required  in  step  5.  Therefore  we  conclude 
that  at  intermediate  grid  levels  (0 </  <L),  the  parallel  communication  and  computation  costs  for 
the  PSMG  algorithm  are  given  by: 

comm  ( PSMG  (/ ))  =  comm  (Q )  +  comm  (A  )  +  comm  (Z )  , 
comp  ( PSM G  (/ ))  =  comp  (Q )  +  comp  (A )  +  comp  (Z )  +  2  , 

where  comm  ( O )  and  comp  ( O )  are  the  number  of  parallel  communication  and  computation 
operations  required  to  execute  operator  O . 

In  Table  2  below  we  present  the  operation  counts  for  each  of  the  individual  operators 
encountered  in  the  four  PSMG  algorithms  corresponding  to  choice  of  5  or  9  point  A  and  to  9  or 
25  point  Q  stencils  (often  referred  to  as  the  PSMG5-9,  PSMG9-9,  PSMG5-25  and  PSMG9-25 
methods).  For  full  details  of  the  derivation  of  these  operation  counts  we  refer  to[18]. 


TABLE  2:  OPERATION  COUNTS  FOR  A,Q,Z 

Operator 

Comp.  Steps 

Comm.  Steps 

5-ptA  (Laplacian) 

3 

4 

9-ptA  (Laplacian) 

5 

4 

9-pt<2  (Interpolation) 

4 

4 

25-pt  Q  (Interpolation) 

12 

8 

9-ptZ  (Relaxation) 

5 

4 
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10.  PSMG  PERFORMANCE 

An  effective  measure  of  algorithm  performance  depends  on  several  factors  besides  con¬ 
vergence  rate.  For  parallel  environments,  measures  of  parallel  work  and  of  communication 
need  to  be  incorporated.  Furthermore  one  needs  to  be  precise  about  the  underlying  model 
machine  specification.  As  a  model  of  parallel  computation  we  have  followed  [22],  as  discussed 
in  the  previous  section. 

If  the  asymptotic  convergence  rate  of  a  method  is  p  and  the  method  requires  w  parallel 
operations  per  iteration,  the  normalized  operation  count  is  defined  as  vWlog10p,  and  measures 
the  parallel  work  required  per  grid  level  to  reduce  the  error  by  a  factor  of  10.  We  consider 
parallel  communication  and  computation  operations  separately. 

For  several  PSMG  methods  we  present  asymptotic  convergence  rates,  the  number  of 
parallel  arithmetic  and  communication  operations  required  on  each  grid  per  iteration,  and  also 
the  normalized  operation  count  for  arithmetic  and  communication.  We  summarize  the  results 
for  several  simple  cases  in  Table  3.  For  each  cycle  we  use  a  single  relaxation  with  a  9-point  Z 
stencil.  The  cases  are  labeled  PSMGa-g  where  a  and  q  are  the  stencil  size  (5  or  9)  of  the 
difference  operator  a  and  of  the  q  matrix  (9  or  25)  respectively. 


TABLE  3:  PSMG  CONVERGENCE  RATES 

Convergence 

Steps  per  Level 

Normalized  Steps 

Method 

Rate 

Comp. 

Comm. 

Comp. 

Comm. 

PSMG  5-9 

.08867 

14 

12 

13.31 

11.40 

PSMG  5-25 

.02504 

22 

16 

13.74 

9.99 

PSMG  9-9 

.02165 

16 

12 

9.61 

7.21 

PSMG  9-25 

.00165 

24 

16 

8.62 

5.75 

The  corresponding  coefficients  for  the  interpolation  operator  Q  and  the  smoothing  operator  Z 
are  (in  the  notation  of[  1 8] ): 


PSMG5-9: 

<7o=-25 

<h=.125 

<7  **=.0625 

z0=.278079 

z^.0534577 

zn=0125615 

PSMG5-25: 

<7o=.361017 

<7i=.11458 

<7H=.0625 

#2=-.0309162 

<7 12=. 0052 1024 

^22=.00316188 

z0=.361452 

z1=.0891718 

zn=. 0293793 

PSMG9-9: 

q0=25 

qi=A25 

<7n=.0625 

z0=.300589 

z  *=.0432465 

zn=0139994 

PSMG9-25: 

<7o=.34152 

<7  *=.0995677 

<7U=0625 

<^=-.0199225 

<712=.0127161 

q22=- .002951 55 

z0=.283286 

z*=0323815 

zn=00835795 

The  convergence  rates  given  in  Table  3  are  the  maximum  values  of  JJ,^  for  0<L<11  and 
therefore  bound  the  exact  convergence  rates  for  all  grids  up  to  size  4  million  points.  In  practice 
we  find  that  the  convergence  rates  are  unchanged  to  several  digits  of  precision  beyond 
about  level  L=6.  We  therefore  are  confident,  that  the  convergence  rates  in  Table  3  extend  to 
arbitrary  numbers  of  grid  levels.  As  a  final  check  we  have  solved  the  Poisson  equation  using 
PSMG,  with  zero  right  hand  side  and  a  random  initial  guess,  and  in  each  case  verified  the  con¬ 
vergence  rates  of  Table  3. 

As  a  comparison  point  we  note  that  the  PSMG9-25  algorithm  uses  about  half  as  much 
computation  and  close  to  one  fifth  as  much  communication  as  the  best  standard  red-black  algo- 
rithms[18, 22],  assuming  of  course  that  there  are  about  as  many  processors  as  fine  grid  points. 
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